
FICO Explainable ML: HELOC

Introduction

In this walkthrough, we'll use Xplainable's platform to analyze the FICO Home Equity Line of Credit (HELOC) dataset. This dataset is commonly used to predict credit risk, helping to identify borrowers who may be at risk of defaulting on their line of credit.

We'll go step-by-step through Xplainable's features, starting with data preprocessing, moving on to model training, and finally, interpreting the results. The goal is to show how Xplainable can simplify the process of building and understanding machine learning models, providing clear insights without the need for complex coding or deep statistical knowledge. By the end of this walkthrough, you'll see how Xplainable can help you quickly draw meaningful conclusions from the FICO HELOC dataset.

Install and import relevant packages

!pip install xplainable
!pip install altair==5.4.1 #Upgrade this to work in Google Colab
!pip install xplainable-client
!pip install kaggle
import pandas as pd
from sklearn.model_selection import train_test_split
import json
import requests

import xplainable_client
import xplainable as xp
from xplainable.core import XClassifier
from xplainable.core.optimisation.bayesian import XParamOptimiser

1. Import CSV and Perform Data Processing

import os
import pandas as pd

def load_heloc_data(
    use_kaggle=True,
    kaggle_dataset_path="./data/heloc_dataset.csv",  # Note: when use_kaggle=True this should be a Kaggle "owner/dataset" slug
    local_path="./data/heloc_dataset.csv",
):
    # Check if using the Kaggle API
    if use_kaggle:
        try:
            # Import the Kaggle API
            from kaggle.api.kaggle_api_extended import KaggleApi

            # Initialise and authenticate the Kaggle API
            api = KaggleApi()
            api.authenticate()

            # Download the dataset
            print("Downloading dataset from Kaggle...")
            api.dataset_download_files(kaggle_dataset_path, path='./data', unzip=True)

            # Load the dataset
            data = pd.read_csv('./data/heloc_dataset.csv')
            print("Dataset loaded from Kaggle.")
            return data

        except Exception as e:
            print(f"Error downloading dataset from Kaggle: {e}. Falling back to local version.")

    # Load from the local path if the Kaggle download fails or is not selected
    try:
        data = pd.read_csv(local_path)
        print("Dataset loaded from local path.")
        return data
    except FileNotFoundError:
        print("Local file not found. Please check the file path.")

# Usage
data = load_heloc_data(use_kaggle=True)  # Set to False to skip Kaggle and use the local file
data.head()

Out:

Error downloading dataset from Kaggle: Could not find kaggle.json. Make sure it's located in /Users/jtuppack/.kaggle. Or use the environment method. See setup instructions at https://github.com/Kaggle/kaggle-api/. Falling back to local version.

Dataset loaded from local path.

(data.head() displays the first five rows of the dataset: the RiskPerformance target plus 23 numeric features such as ExternalRiskEstimate, MSinceOldestTradeOpen, AverageMInFile, and PercentTradesWBalance. Negative placeholder values such as -7 and -8 appear throughout.)

The definition of each field is given below:

Variable Name: Description

RiskPerformance: Paid as negotiated flag (12-36 months); string, "Good" or "Bad"
ExternalRiskEstimate: Consolidated version of risk markers
MSinceOldestTradeOpen: Months since oldest trade open
MSinceMostRecentTradeOpen: Months since most recent trade open
AverageMInFile: Average months in file
NumSatisfactoryTrades: Number of satisfactory trades
NumTrades60Ever2DerogPubRec: Number of trades 60+ ever
NumTrades90Ever2DerogPubRec: Number of trades 90+ ever
PercentTradesNeverDelq: Percent of trades never delinquent
MSinceMostRecentDelq: Months since most recent delinquency
MaxDelq2PublicRecLast12M: Max delinquency/public records in the last 12 months. See tab 'MaxDelq' for each category
MaxDelqEver: Max delinquency ever. See tab 'MaxDelq' for each category
NumTotalTrades: Number of total trades (total number of credit accounts)
NumTradesOpeninLast12M: Number of trades open in the last 12 months
PercentInstallTrades: Percent of installment trades
MSinceMostRecentInqexcl7days: Months since most recent inquiry, excluding the last 7 days
NumInqLast6M: Number of inquiries in the last 6 months
NumInqLast6Mexcl7days: Number of inquiries in the last 6 months, excluding the last 7 days (this removes inquiries likely due to price-comparison shopping)
NetFractionRevolvingBurden: Revolving balance divided by credit limit
NetFractionInstallBurden: Installment balance divided by original loan amount
NumRevolvingTradesWBalance: Number of revolving trades with balance
NumInstallTradesWBalance: Number of installment trades with balance
NumBank2NatlTradesWHighUtilization: Number of bank/national trades with high utilization ratio
PercentTradesWBalance: Percent of trades with balance
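The preview of the data shows negative placeholder values such as -7 and -8. In the FICO HELOC data these negative codes are commonly interpreted as special conditions (for example, no usable account history) rather than true quantities; verify against the official data dictionary before relying on this. As an optional preprocessing step, a minimal sketch (using a toy frame, not the real dataset) of mapping them to missing values:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for two HELOC columns (values as seen in data.head())
df = pd.DataFrame({
    "MSinceMostRecentDelq": [-7, 27, -7],
    "NetFractionRevolvingBurden": [33, 0, 4],
})

# Treat the negative placeholder codes as missing values
cleaned = df.replace([-9, -8, -7], np.nan)

print(cleaned["MSinceMostRecentDelq"].isna().sum())  # 2 placeholder entries become NaN
```

Whether to apply this depends on the model: some learners benefit from keeping the codes as distinct signal, since "condition not met" is itself informative.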

Separate data into target (y) and features (x)

y = data['RiskPerformance']
x = data.drop('RiskPerformance',axis=1)

Create test and train datasets

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42, stratify=y)
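Passing stratify=y keeps the proportion of "Good" and "Bad" labels identical in the train and test splits, so evaluation is not skewed by an unlucky split. A minimal sketch with toy labels (hypothetical values, not the HELOC data):

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy labels: 75 "Good", 25 "Bad" (hypothetical, for illustration only)
X = [[i] for i in range(100)]
y = ["Good"] * 75 + ["Bad"] * 25

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The 75/25 ratio is preserved in both splits
print(Counter(y_tr))  # Counter({'Good': 60, 'Bad': 20})
print(Counter(y_te))  # Counter({'Good': 15, 'Bad': 5})
```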

2. Model Optimisation

Xplainable's XParamOptimiser fine-tunes the model's hyperparameters, searching for the parameter set that yields the best cross-validated performance.

y_train_df = pd.Series(y_train)

optimiser = XParamOptimiser(metric='f1-score',n_trials=300, n_folds=2, early_stopping=150)
params = optimiser.optimise(x_train, y_train_df)
Out:

74%|███████▍ | 223/300 [00:14<00:04, 15.92trial/s, best loss: -0.7057194555537591]

3. Model Training

The XClassifier is then trained on the dataset using the optimised parameters.

model = XClassifier(**params)
model.fit(x_train, y_train)
Out:

<xplainable.core.ml.classification.XClassifier at 0x108e252a0>

4. Explaining and Interpreting the Model

Following training, the model.explain() method is called to generate insights into the model's decision-making process. This step is crucial for understanding the factors that influence the model's predictions and ensuring that the model's behaviour is transparent and explainable.

model.explain()

Analysing Feature Importances and Contributions

Click on the bars to see the importances and contributions of each variable.

Feature Importances

The relative significance of each feature (input variable) in making predictions: it indicates how much each feature contributes to the model's predictions overall, with higher values implying greater influence.

Feature Contributions

The effect of each feature on an individual prediction. In this model, for instance, feature contributions show how each feature (like the net fraction revolving burden) shifts the predicted risk estimate for a particular applicant.
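To make the distinction concrete, here is a toy sketch with hypothetical numbers (not xplainable's internals): contributions are per-applicant effects, and one common way to summarise a global importance is the mean absolute contribution across rows.

```python
# Per-applicant feature contributions (hypothetical values for illustration)
contributions = [
    {"ExternalRiskEstimate": -0.02, "NetFractionRevolvingBurden": 0.05},
    {"ExternalRiskEstimate": 0.04,  "NetFractionRevolvingBurden": -0.01},
    {"ExternalRiskEstimate": -0.03, "NetFractionRevolvingBurden": 0.02},
]

# A global importance can be summarised as the mean absolute contribution
importance = {
    f: sum(abs(row[f]) for row in contributions) / len(contributions)
    for f in contributions[0]
}
print(importance)  # ExternalRiskEstimate ~0.03, NetFractionRevolvingBurden ~0.027
```

A feature can matter globally while having a near-zero effect on one specific applicant, which is why both views are shown.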

5. Saving a model to the Xplainable App

In this step, we first create a unique identifier for our HELOC risk prediction model using client.create_model_id. This identifier, referred to as model_id, represents the newly created model that predicts the likelihood of applicants defaulting on their line of credit. After creating this model identifier, we generate a specific version of the model using client.create_model_version, passing in our training data. The resulting version_id represents this particular iteration of our model, allowing us to track and manage different versions systematically.

Create an API key in the Xplainable platform

Heloc Deployment

client = xplainable_client.Client(
api_key="add_your_own_api_key_here",
)
Out:

get_response_content

<Response [200]>

# Create a model
model_id = client.create_model_id(
model,
model_name="HELOC",
model_description="Predicting applicant risk estimates"
)
# Create a version for that model
version_id = client.create_model_version(model, model_id, x_train, y_train)
Out:

get_response_content

<Response [200]>

0%| | 0/23 [00:00<?, ?it/s]

get_response_content

<Response [200]>

Xplainable App Model View

As the screenshot shows, the model has now been saved to Xplainable's web app, allowing you and other members of your organisation to analyse it visually.

Heloc Deployment

6. Deployments

The code block below illustrates the deployment of our HELOC risk model using the client.deploy function. The deployment process involves specifying the hostname of the server where the model will be hosted, along with the model_id and version_id obtained in the previous steps. This step activates the model's endpoint, allowing it to receive and process prediction requests. The output confirms the deployment with a deployment_id, the model's current status ('inactive'), its location, and the endpoint URL where it can be accessed.

deployment = client.deploy(
    hostname="https://inference.xplainable.io",
    model_id=model_id,
    version_id=version_id
)
deployment
Out:

{'deployment_id': 'Du526MrksjGMu4Np',
 'status': 'inactive',
 'location': 'syd',
 'endpoint': 'https://inference.xplainable.io/v1/predict'}

Testing the Deployment Programmatically

This section demonstrates the steps taken to programmatically test a deployed model. These steps are essential for validating that the model's deployment is functional and ready to process incoming prediction requests.

  1. Activating the Deployment: The deployment is activated using client.activate_deployment, which changes its status to active so it can accept prediction requests.
client.activate_deployment(deployment['deployment_id'])
Out:

{'message': 'activated deployment'}

  2. Creating a Deployment Key: A deployment key is generated with client.generate_deploy_key. This key is required to authenticate and make secure requests to the deployed model.
deploy_key = client.generate_deploy_key('HELOC Deploy Key', deployment['deployment_id'], 7)
  3. Generating an Example Payload: An example payload for a deployment request is generated by client.generate_example_deployment_payload. This payload mimics the input structure the model expects when making predictions.

# Set the option to highlight multiple ways of creating data
option = 1
if option == 1:
    body = client.generate_example_deployment_payload(deployment['deployment_id'])
else:
    body = json.loads(data.drop(columns=["RiskPerformance"]).sample(1).to_json(orient="records"))
body
Out:

[{'ExternalRiskEstimate': 66.0,
  'MSinceOldestTradeOpen': 322.0,
  'MSinceMostRecentTradeOpen': None,
  'AverageMInFile': 94.0,
  'NumSatisfactoryTrades': 25.5,
  'NumTrades60Ever2DerogPubRec': 2.5,
  'NumTrades90Ever2DerogPubRec': 1.0,
  'PercentTradesNeverDelq': 98.0,
  'MSinceMostRecentDelq': 27.0,
  'MaxDelq2PublicRecLast12M': -4.5,
  'MaxDelqEver': 3.5,
  'NumTotalTrades': 6.0,
  'NumTradesOpeninLast12M': 2.5,
  'PercentInstallTrades': 5.5,
  'MSinceMostRecentInqexcl7days': -1.5,
  'NumInqLast6M': None,
  'NumInqLast6Mexcl7days': -2.0,
  'NetFractionRevolvingBurden': 20.5,
  'NetFractionInstallBurden': 91.5,
  'NumRevolvingTradesWBalance': 8.5,
  'NumInstallTradesWBalance': None,
  'NumBank2NatlTradesWHighUtilization': None,
  'PercentTradesWBalance': None}]

  4. Making a Prediction Request: A POST request is made to the model's prediction endpoint with the example payload. The model processes the input data and returns a prediction response, which includes the predicted class (e.g. 'Bad' for a high-risk applicant) and the prediction probability.
response = requests.post(
    url="https://inference.xplainable.io/v1/predict",
    headers={'api_key': deploy_key['deploy_key']},
    json=body
)

value = response.json()
value
Out:

[{'index': 0,
  'id': None,
  'partition': '__dataset__',
  'score': 0.4421742713512326,
  'proba': 0.3200335966035672,
  'pred': 'Bad',
  'support': 918,
  'breakdown': {'base_value': 0.4780686028445082,
   'ExternalRiskEstimate': -0.018834502005451913,
   'MSinceOldestTradeOpen': 0.016605591083751464,
   'MSinceMostRecentTradeOpen': 0.0,
   'AverageMInFile': 0.008550347872215057,
   'NumSatisfactoryTrades': 0.01024150167891105,
   'NumTrades60Ever2DerogPubRec': -0.021627198964509615,
   'NumTrades90Ever2DerogPubRec': -0.014139475517112348,
   'PercentTradesNeverDelq': 0.01801186828646687,
   'MSinceMostRecentDelq': -0.0070614886012018395,
   'MaxDelq2PublicRecLast12M': -0.0025265052623417955,
   'MaxDelqEver': -0.01229060204294862,
   'NumTotalTrades': -0.013329005986282996,
   'NumTradesOpeninLast12M': 0.00315905436830356,
   'PercentInstallTrades': -0.001906562609604855,
   'MSinceMostRecentInqexcl7days': -0.0094064788906335,
   'NumInqLast6M': 0.0,
   'NumInqLast6Mexcl7days': 0.011041977442703738,
   'NetFractionRevolvingBurden': 0.015861228290895455,
   'NetFractionInstallBurden': -0.0072543837034777315,
   'NumRevolvingTradesWBalance': -0.010989696932957581,
   'NumInstallTradesWBalance': 0.0,
   'NumBank2NatlTradesWHighUtilization': 0.0,
   'PercentTradesWBalance': 0.0}}]
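The breakdown makes the prediction auditable: in the response above, the score equals the base value plus the sum of the per-feature contributions. A quick check using the values from that response (the additive relationship is an observation about this output, not documented API behaviour):

```python
# Prediction response from above, reduced to the fields needed for the check
result = {
    "score": 0.4421742713512326,
    "breakdown": {
        "base_value": 0.4780686028445082,
        "ExternalRiskEstimate": -0.018834502005451913,
        "MSinceOldestTradeOpen": 0.016605591083751464,
        "MSinceMostRecentTradeOpen": 0.0,
        "AverageMInFile": 0.008550347872215057,
        "NumSatisfactoryTrades": 0.01024150167891105,
        "NumTrades60Ever2DerogPubRec": -0.021627198964509615,
        "NumTrades90Ever2DerogPubRec": -0.014139475517112348,
        "PercentTradesNeverDelq": 0.01801186828646687,
        "MSinceMostRecentDelq": -0.0070614886012018395,
        "MaxDelq2PublicRecLast12M": -0.0025265052623417955,
        "MaxDelqEver": -0.01229060204294862,
        "NumTotalTrades": -0.013329005986282996,
        "NumTradesOpeninLast12M": 0.00315905436830356,
        "PercentInstallTrades": -0.001906562609604855,
        "MSinceMostRecentInqexcl7days": -0.0094064788906335,
        "NumInqLast6M": 0.0,
        "NumInqLast6Mexcl7days": 0.011041977442703738,
        "NetFractionRevolvingBurden": 0.015861228290895455,
        "NetFractionInstallBurden": -0.0072543837034777315,
        "NumRevolvingTradesWBalance": -0.010989696932957581,
        "NumInstallTradesWBalance": 0.0,
        "NumBank2NatlTradesWHighUtilization": 0.0,
        "PercentTradesWBalance": 0.0,
    },
}

breakdown = result["breakdown"]
contribs = {k: v for k, v in breakdown.items() if k != "base_value"}

# The score is the base value plus the sum of all feature contributions
reconstructed = breakdown["base_value"] + sum(contribs.values())
assert abs(reconstructed - result["score"]) < 1e-9

# Rank the strongest drivers of this particular prediction
top = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
for name, value in top:
    print(f"{name}: {value:+.4f}")
```

Here the largest single drivers are the derogatory-trade counts and the external risk estimate, which is the kind of per-applicant reasoning a credit analyst can sanity-check directly.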

SaaS Deployment Info

The SaaS application interface shown above mirrors the operations performed programmatically in the earlier steps. It provides a dashboard for managing the 'HELOC' model, supporting everything from deployment to testing within a user-friendly web interface. This makes it accessible even to non-technical users who prefer to manage model deployments and monitor performance graphically rather than through code. The deployment checklist, example payload, and prediction response are all integrated into the application, giving users full control and visibility over the deployment lifecycle and model interactions.

Heloc Deployment